Whither Linguistic Interpretation of Acoustic Pronunciation Variation
نویسندگان
چکیده
Recent research suggests that modelling pronunciation variation is more appropriate at the syllable level than at the level of contextdependent phones. Due to the large number of factors affecting syllable pronunciation, the creation of multi-path topologies is nec essary. Previous research on multi-path models in connected digit recognition has proved trajectory clustering to be an attractive ap proach to deriving multi-path models. In this paper, we extend our research to large-vocabulary continuous speech recognition (LVCSR) by deriving trajectory clusters for 94 frequent syllables in a 20-hour corpus of Dutch read speech. With multi-path models based on these trajectory clusters, speech recognition performance improves signif icantly. We believe that recognition performance can be improved further by adapting the topologies of the parallel paths. However, the physical properties of the clusters do not provide clues to the most appropriate topology, or the best way of initialising the state observation densities. Therefore, we attempt to interpret the clusters in terms of linguistic and phonetic criteria. The results obtained so far suggest that there is no straightforward relation between physi cally defined trajectory clusters and linguistic and phonetic criteria.
منابع مشابه
Improving speech recognition for children using acoustic adaptation and pronunciation modeling
Developing a robust Automatic Speech Recognition (ASR) system for children is a challenging task because of increased variability in acoustic and linguistic correlates as function of young age. The acoustic variability is mainly due to the developmental changes associated with vocal tract growth. On the linguistic side, the variability is associated with limited knowledge of vocabulary, pronunc...
متن کاملA Syllable Based Approach for Improved Recognition of Spoken Names
Recognition of spoken names is a challenging task for speech recognition systems because of the large variations in speaking styles, linguistic origins and pronunciation found in names. The complex linguistic nature of names makes it difficult to automatically generate pronunciation variations. For many applications the list of names tends to be in the order of several hundred thousands, making...
متن کاملA study of implicit and explicit modeling of coarticulation and pronunciation variation
In this paper, we focus on the modeling of coarticulation and pronunciation variation in Automatic Speech Recognition systems (ASR). Most ASR systems explicitly describe these production phenomena through context-dependent phoneme models and multiple pronunciation lexicons. Here, we explore the potential benefit of using feature spaces covering longer time segments in terms of implicit modeling...
متن کاملModeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition
This paper presents different methods of handling pronunciation variations in Cantonese large-vocabulary continuous speech recognition. In an LVCSR system, three knowledge sources are involved: a pronunciation lexicon, acoustic models and language models. In addition, a decoding algorithm is used to search for the most likely word sequence. Pronunciation variation can be handled by explicitly m...
متن کاملPronunciation Modeling In Speech Synthesis
This dissertation investigates the area of pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality human-like pronunciations. The term pronunciation modeling has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997). In that context, it describes theories and procedures for handling t...
متن کامل